Abstract

In  this  paper,  the  Basic  Bidirectional  Associative  Memory  (BAM)  is  extended  by  choosing  weights 
in  the  correlation  matrix,  for  a  given  set  of  training  pairs,  which  result  in  a  maximum  noise  tolerance  set 
for  BAM.  This  optimized  BAM  will  recall  the  correct  training  pair  if  an  input  pair  is  within  the  maximum 
noise  tolerance  set.  We  define  a  hyper-radius,  and  we  prove  that  for  a  given  set  of  training  pairs,  the 
maximum  noise  tolerance  set  is  the  largest,  in  the  sense  that  at  least  one  pair  outside  the  maximum 
noise  tolerance  set,  and  within  a  Hamming  distance  one  larger  than  the  hyper-radius  associated  with 
the  maximum  noise  tolerance  set,  will  not  converge  to  the  correct  training  pair.  A  standard  Genetic 
Algorithm  (GA)  is  used  to  calculate  the  weights  to  maximize  the  objective  function  which  generates  a 
maximum  tolerance  set  for  BAM.  Computer  simulations  are  presented  to  illustrate  the  error  correction 
and  fault  tolerance  properties  of  the  optimized  BAM. 
I. Introduction

In  1968,  Anderson  [6]  proposed  a  memory  structure  named  Linear  Associative  Memory 
(LAM), which can be used in hetero-associative pattern recognition. Since LAM is noise sensitive,
Optimal Linear Associative Memory was introduced by Wee [7] and Kohonen [8], which
extended  the  LAM  by  absorbing  the  noise.  Although  good  results  can  be  obtained  using  these 
early  approaches,  many  theoretical  and  practical  issues  such  as  network  stability  and  storage 
capacity  were  still  unresolved.  In  1988,  Kosko  [1]  presented  the  theory  of  bidirectional  associative 
memory  by  generalizing  the  Hopfield  network  model. 

As  a  class  of  artificial  neural  networks,  Bidirectional  Associative  Memories  (BAM)  provide 
massive  parallelism,  high  error  correction  and  high  fault  tolerance  ability.  However,  to  form  a 
good BAM, a good encoding strategy is required. This field has received extensive attention
from researchers and a substantial effort has been devoted to various learning rules. Kosko [1]
has provided a correlation learning strategy and proved that the BAM process will converge after
a finite number of iterations. However, the correlation matrix used by Kosko cannot guarantee
that the energy of any training pair is a local minimum. That is, it cannot guarantee recall of
any  training  pair  even  for  a  very  small  set  of  training  data. 

During  the  following  years,  various  encoding  strategies  and  learning  rules  were  proposed  to 
improve  the  capacity  and  the  performance  of  BAM.  In  1990,  Wang,  Cruz,  and  Mulligan  [2] 
introduced two BAM encoding schemes to increase the recall performance with a trade-off of
more neurons. These are multiple training methods, which guarantee the recall of all training pairs
[3]. In 1993 and 1994, Leung [9] [10] presented the Enhanced Householder Encoding Algorithm
(EHCA),  which  was  improved  by  Lenze  [11]  in  2001,  to  enlarge  the  capacity.  In  1995,  Wang 
and  Don  [12]  introduced  the  exponential  bidirectional  associative  memory  (eBAM),  which  uses 
an  exponential  encoding  rule  rather  than  the  correlation  scheme. 

However,  these  methods  have  focused  on  the  training  set  or  capacity  only.  The  noisy  neighbor 
pairs  and  the  noise  tolerance  set  of  BAM  have  been  ignored.  In  this  paper,  we  are  especially 
interested  in  the  approach  proposed  by  Wang,  Cruz,  and  Mulligan  [2]  [3]  and  extend  the  BAM 
by  choosing  the  weights  for  training  pairs  in  the  BAM  correlation  matrix,  which  can  maximize 
the  noise  tolerance  set,  for  a  given  set  of  training  pairs,  such  that  any  noisy  input  pair  within 
the  tolerance  set  will  converge  to  the  correct  training  pair. 
Some basic concepts of BAM are reviewed in Section II. Then, the multiple training concept
is  extended  in  Section  III  with  the  optimization-based  encoding  strategy  for  constructing  the 
correlation  matrix.  Two  lemmas  and  a  theorem  about  the  new  encoding  rule  are  proved  in  the 
same  section.  These  provide  the  foundation  for  constructing  the  maximum  noise  tolerance  set.  We 
present  a  numerical  example  in  Section  IV  to  illustrate  the  effectiveness  of  the  extended  BAM. 
In this example, a standard GA is used to solve the nonlinear optimization problem and obtain the
optimum  training  weights.  Finally,  we  draw  conclusions  and  enumerate  some  possible  future 
extensions  in  Section  V. 


II.  Bidirectional  Associative  Memory 

BAM is a two-layer hetero-associative feedback neural network model first introduced by Kosko [1]. As shown in Fig. 1, the input layer $L_A$ includes $n$ binary valued neurons $(a_1, a_2, \ldots, a_n)$ and the output layer $L_B$ comprises $m$ binary valued components $(b_1, b_2, \ldots, b_m)$. Now we have $L_A = \{0,1\}^n$ and $L_B = \{0,1\}^m$. BAM can be denoted as a bi-directional mapping in vector space $M: R^n \leftrightarrow R^m$.

Fig. 1. Structure of Bidirectional Associative Memory

The training pairs can be stored in the correlation matrix as follows:

$$M = \sum_{i=1}^{N} X_i^T Y_i$$

where $X_i$ and $Y_i$ are the bipolar mode of $A_i$ and $B_i$ respectively, i.e.

$$X_i = 2A_i - 1, \qquad Y_i = 2B_i - 1$$

If inputs $X_1, X_2, \ldots, X_N$ are orthogonal to each other, i.e.

$$X_i X_j^T = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$

then,

$$X_i M = X_i \Big( \sum_{j=1}^{N} X_j^T Y_j \Big) = X_i X_i^T Y_i + \sum_{j=1,\, j \neq i}^{N} X_i X_j^T Y_j = Y_i$$
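The orthogonal-recall identity can be checked numerically. The following is a minimal sketch in plain Python; the two training pairs and the helper names `correlation_matrix` and `row_times_matrix` are illustrative assumptions, not taken from the paper. Because the bipolar vectors are not normalized here, $X_i X_i^T = n$ rather than 1, so the recalled vector is $n Y_i$; its signs still reproduce $Y_i$.

```python
def correlation_matrix(xs, ys):
    """Return M = sum_i X_i^T Y_i as a list of rows (n x m)."""
    n, m = len(xs[0]), len(ys[0])
    M = [[0] * m for _ in range(n)]
    for x, y in zip(xs, ys):
        for r in range(n):
            for c in range(m):
                M[r][c] += x[r] * y[c]
    return M


def row_times_matrix(v, M):
    """Return the row vector v M."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0]))]


# Two mutually orthogonal bipolar inputs (X_1 X_2^T = 0), chosen for illustration.
xs = [[1, 1, 1, 1], [1, -1, 1, -1]]
ys = [[1, -1, 1], [-1, -1, 1]]

M = correlation_matrix(xs, ys)
recalled = [row_times_matrix(x, M) for x in xs]
print(recalled)  # each entry is 4 * Y_i because X_i X_i^T = 4 for these inputs
```

Taking the sign of each recalled entry returns the stored $Y_i$ exactly, since the cross-talk term vanishes for orthogonal inputs.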

To obtain higher accuracy for associative memory and retrieve one of the nearest training inputs, the output $Y$ can be fed back to BAM. Starting with a pair $(\alpha_0, \beta_0)$, determine a sequence $(\alpha_1, \beta_1), (\alpha_2, \beta_2), \ldots$, until it finally converges to an equilibrium point $(\alpha_F, \beta_F)$. If BAM converges for every training pair, $M$ is said to be bidirectionally stable.

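The sequence can be obtained as follows:

$$[\beta_{i+1}]_k = \begin{cases} 1, & [\alpha_i M]_k > \delta_k \\ [\beta_i]_k, & [\alpha_i M]_k = \delta_k \\ -1, & [\alpha_i M]_k < \delta_k \end{cases}$$

$$[\alpha_{i+1}]_k = \begin{cases} 1, & [\beta_i M^T]_k > \varepsilon_k \\ [\alpha_i]_k, & [\beta_i M^T]_k = \varepsilon_k \\ -1, & [\beta_i M^T]_k < \varepsilon_k \end{cases}$$

where $[\cdot]_k$ is the $k$th element of the vector. $\varepsilon_k$ and $\delta_k$ are two thresholds for the $k$th element of $\alpha_i$ and $\beta_i$ respectively. If $(\varepsilon, \delta)^T = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, \delta_1, \delta_2, \ldots, \delta_m)^T = 0$, then this kind of BAM is called homogeneous. Others are called non-homogeneous BAM.

For each pair $(\alpha, \beta)$, the Lyapunov or energy function is defined as

$$E = \begin{cases} -\alpha M \beta^T, & (\varepsilon, \delta)^T = 0 \\ -\alpha M \beta^T + \alpha \varepsilon^T + \beta \delta^T, & (\varepsilon, \delta)^T \neq 0 \end{cases}$$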


Kosko [1] and Haines et al. [4] have proved that after a finite number of iterations, $E$ converges
to a local minimum, where the corresponding pair $(\alpha_F, \beta_F)$ is a stable point.
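The recall iteration and the energy function for the homogeneous case (all thresholds zero) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names, the small matrix `M`, and the noisy input are assumptions, and the sketch feeds the freshly updated output layer back to the input layer, which is one common reading of the bidirectional update.

```python
def sign_update(net, previous):
    """Threshold each net input at 0; keep the previous state on a tie."""
    return [1 if s > 0 else -1 if s < 0 else p for s, p in zip(net, previous)]


def bam_recall(M, alpha, beta, max_iters=100):
    """Iterate alpha -> beta -> alpha until the pair stops changing."""
    n, m = len(M), len(M[0])
    for _ in range(max_iters):
        net_b = [sum(alpha[r] * M[r][c] for r in range(n)) for c in range(m)]
        new_beta = sign_update(net_b, beta)
        net_a = [sum(M[r][c] * new_beta[c] for c in range(m)) for r in range(n)]
        new_alpha = sign_update(net_a, alpha)
        if new_alpha == alpha and new_beta == beta:
            break
        alpha, beta = new_alpha, new_beta
    return alpha, beta


def energy(M, alpha, beta):
    """E = -alpha M beta^T for the homogeneous case."""
    return -sum(alpha[r] * M[r][c] * beta[c]
                for r in range(len(M)) for c in range(len(M[0])))


# Correlation matrix of the illustrative pairs
# ([1,1,1,1] -> [1,-1,1]) and ([1,-1,1,-1] -> [-1,-1,1]).
M = [[0, -2, 2], [2, 0, 0], [0, -2, 2], [2, 0, 0]]

noisy_alpha = [1, 1, 1, -1]   # first stored input with its last bit flipped
start_beta = [1, -1, 1]
stable = bam_recall(M, noisy_alpha, start_beta)
print(stable, energy(M, noisy_alpha, start_beta), energy(M, *stable))
```

In this small case the noisy input settles on the stored pair, and the energy drops from the starting value to that of the stable point, in line with the convergence result cited above.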

McEliece et al. [5] have shown that if the training pairs are even coded ($\pm 1$ with probability 0.5) and $n$-dimensional, the storage capacity of the homogeneous BAM is $\frac{n}{2 \log_2 n}$. That means, if $L$ even-coded stable states are chosen uniformly at random, the maximum value of $L$ in order that most of the $L$ original vectors are accurately recalled is $\frac{n}{2 \log_2 n}$.
For the non-homogeneous BAM, Haines and Hecht-Nielsen [4] have pointed out that the possible number of stable states is between 1 and $2^{\min(m,n)}$. However, since these stable states are chosen in a rigid geometrical procedure, the storage capacity of the non-homogeneous BAM is less than the maximum number. Haines and Hecht-Nielsen [4] have also shown that for $N$ uniformly randomly chosen training pairs of the same dimension $n$, with exactly $(4 + \log_2 n)$ entries equal to $+1$ and $(n - 4 - \log_2 n)$ entries equal to $-1$, if $N$ is below the bound derived in [4], then a non-homogeneous BAM can be constructed so that approximately 98% of these chosen pairs can be stable states.


III.  Encoding  Strategy  for  BAM  with  Maximum  Noise  Tolerance  Set 


In this new enhanced model, we start with a weighted learning rule of BAM similar to the Multiple Training Strategy in [3]. For a given set of training pairs $\{(X_i, Y_i)\}_{i=1}^{N}$, the weighted correlation matrix is

$$M = \sum_{i=1}^{N} w_i X_i^T Y_i \tag{1}$$

where,

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iQ}), \qquad Y_i = (y_{i1}, y_{i2}, \ldots, y_{iP})$$

$Q$ and $P$ are the lengths of the input and output patterns respectively. $W = (w_1, w_2, \ldots, w_N)$ is the vector of training weights. In [3], necessary and sufficient conditions are derived for choosing $W$ such that each training pair can be recalled correctly.

The energy of a training pair $(X_i, Y_i)$ is defined as

$$E(X_i, Y_i, M) = -X_i M Y_i^T \tag{2}$$


If the energy of a training pair is lower than that of all its neighbors at Hamming distance one,
then the training pair can be recalled correctly.
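To make the role of the training weights concrete, the sketch below builds the weighted correlation matrix of equation (1), evaluates the energy of equation (2), and tests whether a training pair has strictly lower energy than every neighbor at Hamming distance one. The helper names and the toy training pairs are assumptions made for illustration only.

```python
def weighted_matrix(xs, ys, weights):
    """M = sum_i w_i X_i^T Y_i, following equation (1)."""
    n, m = len(xs[0]), len(ys[0])
    M = [[0] * m for _ in range(n)]
    for w, x, y in zip(weights, xs, ys):
        for r in range(n):
            for c in range(m):
                M[r][c] += w * x[r] * y[c]
    return M


def energy(M, x, y):
    """E(X, Y, M) = -X M Y^T, following equation (2)."""
    return -sum(x[r] * M[r][c] * y[c]
                for r in range(len(x)) for c in range(len(y)))


def is_strict_local_minimum(M, x, y):
    """True if every one-bit neighbor of (x, y) has strictly higher energy."""
    base = energy(M, x, y)
    for k in range(len(x)):
        if energy(M, x[:k] + [-x[k]] + x[k + 1:], y) <= base:
            return False
    for k in range(len(y)):
        if energy(M, x, y[:k] + [-y[k]] + y[k + 1:]) <= base:
            return False
    return True


# Illustrative bipolar training pairs (not from the paper).
xs = [[1, 1, 1, 1], [1, -1, 1, -1]]
ys = [[1, -1, 1], [-1, -1, 1]]

equal = weighted_matrix(xs, ys, [1, 1])
skewed = weighted_matrix(xs, ys, [1, 0])   # second pair receives no weight
print([is_strict_local_minimum(equal, x, y) for x, y in zip(xs, ys)])
print(is_strict_local_minimum(skewed, xs[1], ys[1]))
```

With equal weights both toy pairs sit in energy wells; giving the second pair zero weight removes its well, which is the kind of failure a suitable choice of $W$ is meant to prevent.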

The set of neighbor pairs at Hamming distance $n \in I$ from a pair $(X_i, Y_i)$ is defined as

$$\Omega(X_i, Y_i, n) = \begin{cases} \{(X, Y) \mid H_x(X_i, X) + H_y(Y_i, Y) = n\}, & n > 0 \\ (X_i, Y_i), & n \leq 0 \end{cases}$$

where $H_x(X_i, X)$ is the Hamming distance between layers $X_i$ and $X$, and $H_y(Y_i, Y)$ is the Hamming distance between layers $Y_i$ and $Y$.
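The neighbor set can be enumerated directly by flipping every combination of $n$ positions across the concatenated input and output patterns. The sketch below is illustrative; the generator name `neighbors` and the tiny example pair are assumptions. It also shows why the number of neighbors at distance $n$ of a single pair is the binomial coefficient $\binom{Q+P}{n}$.

```python
from itertools import combinations


def neighbors(x, y, n):
    """Yield every pair whose combined Hamming distance from (x, y) is n.

    For n <= 0 the only element is the pair itself, matching the definition.
    """
    if n <= 0:
        yield list(x), list(y)
        return
    joint = list(x) + list(y)
    q = len(x)
    for positions in combinations(range(len(joint)), n):
        flipped = joint[:]
        for k in positions:
            flipped[k] = -flipped[k]   # bipolar complement
        yield flipped[:q], flipped[q:]


def hamming(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


x, y = [1, -1], [1]            # Q = 2, P = 1, illustrative values
ring = list(neighbors(x, y, 2))
print(len(ring))               # C(3, 2) = 3 pairs at distance two
```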
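Lemma 1: If a training weight vector $W = [w_1, w_2, \ldots, w_N]^T$ satisfies

$$[\Gamma_1, \Gamma_2, \ldots, \Gamma_N]^T W > 0 \tag{3}$$

where,

$$\Gamma_i = \begin{bmatrix} \gamma_{i1}^{A_1} & \cdots & \gamma_{iN}^{A_1} \\ \vdots & & \vdots \\ \gamma_{i1}^{A_Q} & \cdots & \gamma_{iN}^{A_Q} \\ \gamma_{i1}^{B_1} & \cdots & \gamma_{iN}^{B_1} \\ \vdots & & \vdots \\ \gamma_{i1}^{B_P} & \cdots & \gamma_{iN}^{B_P} \end{bmatrix}$$

$$\gamma_{ij}^{A_k} = A_i X_j^T Y_j B_i^T - A_i^k X_j^T Y_j B_i^T$$

$$\gamma_{ij}^{B_k} = A_i X_j^T Y_j B_i^T - A_i X_j^T Y_j (B_i^k)^T$$

and $A_i^k$ ($B_i^k$) differs from $A_i$ ($B_i$) only in the $k$-th bit,

then $\exists\, T \in I^+$ such that any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $1 \leq n \leq T$, has higher energy than any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$.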

Proof: Wang, Cruz, and Mulligan [2] have proved that if a training weight vector $W$ satisfies condition (3), then all training pairs can be recalled correctly. Since a training pair $P_i$ can be recalled correctly if and only if $P_i$ is a local minimum on the energy surface, any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, 1)$ has higher energy than any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, 0) \Big]$. So there exists at least $T = 1$ such that any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $1 \leq n \leq T$, has higher energy than any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$. ■


Definition 1: For a BAM($W$, $M$) satisfying condition (3), we define the maximum $T$ as the energy well hyper-radius $F$, which satisfies the following:

1) $F \in I^+$;

2) any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $n \in I$ and $1 \leq n \leq F$, has higher energy than any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$;

3) at least one pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, F+1)$ has energy lower than or equal to that of at least one pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, F) \Big]$.


Lemma 2: Given a desired training pair set $\{(X_i, Y_i)\}_{i=1}^{N}$ and a weight vector $W$ satisfying condition (3), with the associated energy well hyper-radius $F$, if we define $V_i(F-1, M) = \{(X, Y) \mid H_x(X, X_i) + H_y(Y, Y_i) \leq F-1\}$ for each $i$, $1 \leq i \leq N$, then,

1) any input pair in the set $V_i(F-1, M)$ converges to the training pair $(X_i, Y_i)$;

2) for any $i$ and $j$ such that $1 \leq i \neq j \leq N$, we have $V_i(F-1, M) \cap V_j(F-1, M) = \emptyset$;

3) an upper bound of the energy well hyper-radius $F(M)$ is

$$F = \frac{1}{2} \min \Big\{ \min_{0 < i \neq j \leq N} H_x(X_i, X_j),\ \min_{0 < i \neq j \leq N} H_y(Y_i, Y_j) \Big\} + 1$$


Proof: From Lemma 1 and Definition 1, since $W$ satisfies (3), its associated energy well hyper-radius satisfies $F \geq 1$.

1) Kosko [1] has pointed out that when a pair is input to a BAM, the network quickly evolves to a system energy local minimum. For any input pair in $V_i(F-1, M)$, there is a high energy "hill" around it. So it is guaranteed that BAM evolves to some pair $(X, Y) \in V_i(F-1, M)$. Since $(X_i, Y_i)$ is the only system energy local minimum in $V_i(F-1, M)$, any input pair in the set $V_i(F-1, M)$ converges to the training pair $(X_i, Y_i)$.

2) For any $1 \leq i \neq j \leq N$, if $V_i(F-1, M) \cap V_j(F-1, M) \neq \emptyset$, then there is at least one pair $(X, Y) \in V_i(F-1, M) \cap V_j(F-1, M)$. From conclusion 1), which we have just proved, $(X, Y)$ converges to both training pairs $(X_i, Y_i)$ and $(X_j, Y_j)$. This implies that $(X_i, Y_i) = (X_j, Y_j)$, which is inconsistent with the condition that $i \neq j$. So, for any $i$ and $j$ such that $1 \leq i \neq j \leq N$, $V_i(F-1, M) \cap V_j(F-1, M) = \emptyset$.

3) From conclusion 2), for any $i$ and $j$ with $1 \leq i \neq j \leq N$ we have $V_i(F-1, M) \cap V_j(F-1, M) = \emptyset$, so we obtain $F - 1 \leq \frac{1}{2} \min \big( H_x(X_i, X_j), H_y(Y_i, Y_j) \big)$, and an upper bound for the energy well hyper-radius is

$$F = \frac{1}{2} \min \Big\{ \min_{0 < i \neq j \leq N} H_x(X_i, X_j),\ \min_{0 < i \neq j \leq N} H_y(Y_i, Y_j) \Big\} + 1$$
Definition 2: For a given training pair set $\{(X_i, Y_i)\}_{i=1}^{N}$ with a weight vector $W$ and the associated energy well hyper-radius $F \geq 1$, we define $V(M) = \bigcup_{i=1}^{N} V_i(F-1, M)$ as the noise tolerance set of BAM($W$, $M$).

Any pair in $V(M)$ input to BAM($W$, $M$) converges to the correct training pair.

We want to find the optimal training weight vector $W^*$ which can generate a correlation matrix $M^*$ with the maximum energy well hyper-radius $F^*$ and the optimum noise tolerance set $V^*(M^*) \supseteq$ any $V(M)$. In [3], Wang et al. considered only neighbors at Hamming distance one, corresponding to $F = 1$ and $V(M) = \{(X_i, Y_i)\}_{i=1}^{N}$. Their method does not provide any information for determining a noise tolerance set $V(M) \supset \{(X_i, Y_i)\}_{i=1}^{N}$.


For each training pair $(X_i, Y_i)$ in a training set $\{(X_i, Y_i)\}_{i=1}^{N}$ and $M$ formed from the training set by equation (1), we define the energy of any neighbor

$$E_i^{m,p}(k_1, k_2, \ldots, k_m; t_1, t_2, \ldots, t_p; M) = -[X_i^m(k_1, k_2, \ldots, k_m)]\, M\, [Y_i^p(t_1, t_2, \ldots, t_p)]^T \tag{4}$$

where,

$$\big( X_i^m(k_1, k_2, \ldots, k_m),\ Y_i^p(t_1, t_2, \ldots, t_p) \big) \in \Omega(X_i, Y_i, m+p).$$

$(k_1, k_2, \ldots, k_m)$ are the position indices of the $m$ bits that take the complementary values (in bipolar mode, the complementary value of $-1$ ($+1$) is $+1$ ($-1$); in binary mode, the complementary value of 1 (0) is 0 (1)) in the input pattern $X_i$,

$$1 \leq k_i \leq Q \ \text{ and } \ k_i \neq k_j \ \text{ if } \ 1 \leq i \neq j \leq m \tag{5}$$

while $(t_1, t_2, \ldots, t_p)$ has a similar meaning for the output pattern $Y_i$,

$$1 \leq t_i \leq P \ \text{ and } \ t_i \neq t_j \ \text{ if } \ 1 \leq i \neq j \leq p \tag{6}$$

Also define

$$X_i^0 = X_i, \qquad Y_i^0 = Y_i, \qquad E_i^{0,0} = E(X_i, Y_i, M), \qquad \phi(x) = \begin{cases} 1, & x > 0 \\ 0, & x \leq 0 \end{cases} \tag{7}$$

Then, for a fixed weight vector $W = (w_1, w_2, \ldots, w_N)$, the objective function is defined as

$$f(W) = \sum_{i=1}^{N} \hat{E}_i(M) \tag{8}$$

where $\hat{E}_i(M)$ is a weighted sum of energy differences between any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $1 \leq n \leq F$, and any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$,

$$\hat{E}_i(M) = \sum_{m=0}^{F} \ \sum_{p=\max(0,\,1-m)}^{F-m} \ \sum_{(5)} \sum_{(6)} \eta_{m,p}\Big( \hat{E}_i^{m,p}(k_1, k_2, \ldots, k_m; t_1, t_2, \ldots, t_p; M) \Big) \tag{9}$$

where $\sum_{(5)} \sum_{(6)}$ means summation over all combinations of $k_1, k_2, \ldots, k_m$ and $t_1, t_2, \ldots, t_p$ which satisfy conditions (5) and (6) respectively.


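If $m \geq 2$ and $p \geq 2$, then,

$$\hat{E}_i^{m,p}(k_1, \ldots, k_m; t_1, \ldots, t_p; M) = \prod_{(k'_1, \ldots, k'_{m-1}) \subset (k_1, \ldots, k_m)} \phi\Big( E_i^{m,p}(k_1, \ldots, k_m; t_1, \ldots, t_p; M) - E_i^{m-1,p}(k'_1, \ldots, k'_{m-1}; t_1, \ldots, t_p; M) \Big) \times \prod_{(t'_1, \ldots, t'_{p-1}) \subset (t_1, \ldots, t_p)} \phi\Big( E_i^{m,p}(k_1, \ldots, k_m; t_1, \ldots, t_p; M) - E_i^{m,p-1}(k_1, \ldots, k_m; t'_1, \ldots, t'_{p-1}; M) \Big) \tag{10}$$

if $m = 1$ and $p = 1$, then,

$$\hat{E}_i^{1,1}(k_1; t_1; M) = \phi\Big( E_i^{1,1}(k_1; t_1; M) + X_i M [Y_i^1(t_1)]^T \Big)\, \phi\Big( E_i^{1,1}(k_1; t_1; M) + X_i^1(k_1) M Y_i^T \Big)$$

if $m = 0$ and $p = 1$, then,

$$\hat{E}_i^{0,1}(t_1; M) = \phi\Big( -X_i M [Y_i^1(t_1)]^T - E_i^{0,0} \Big)$$

if $m = 1$ and $p = 0$, then,

$$\hat{E}_i^{1,0}(k_1; M) = \phi\Big( -X_i^1(k_1) M Y_i^T - E_i^{0,0} \Big)$$

and

$$\eta_{m,p}(x) = \begin{cases} 1, & x > 0 \\ -H_{m+p}, & x \leq 0 \end{cases} \tag{11}$$

The series $H_l$ can be generated by the following formula,

$$H_{F+1} = -1, \qquad H_F = 1, \qquad H_{l-1} = N \sum_{j=l}^{F} (H_j + 1) \binom{P+Q}{j}, \quad l = F, F-1, \ldots, 2 \tag{12}$$

where, for any $n \geq m \geq 0$, $n \in I$, $m \in I$,

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}$$

It is obvious that the series $H_l$ is strictly decreasing.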
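The growth of the penalty series is easy to see numerically. The sketch below assumes the recursion of (12) reads $H_{F+1} = -1$, $H_F = 1$, and $H_{l-1} = N \sum_{j=l}^{F} (H_j + 1) \binom{P+Q}{j}$; this reading is inferred from how $H$ is used in the proof of the theorem and should be treated as an assumption. The function name `penalty_series` and the parameter values are illustrative only.

```python
from math import comb


def penalty_series(N, Q, P, F):
    """Return {l: H_l} for l = 1 .. F+1 under the assumed recursion."""
    H = {F + 1: -1, F: 1}
    for l in range(F, 1, -1):
        # A single violation at level l-1 must outweigh every possible
        # violation at the levels l .. F combined.
        H[l - 1] = N * sum((H[j] + 1) * comb(P + Q, j) for j in range(l, F + 1))
    return H


H = penalty_series(N=2, Q=2, P=2, F=3)
print(H)
```

Under this assumption each $H_{l-1}$ exceeds the total penalty available at all larger distances, so a violation close to a training pair always costs more than any combination of violations farther away.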
Maximum Noise Tolerance Theorem: Given a set of training pairs $\{(X_i, Y_i)\}_{i=1}^{N}$ and at least one $W$ satisfying the condition of Lemma 1, if $W^*$ denotes the $W$ that maximizes $f(W)$, where $f$ is given in (4) - (12),

$$W^* = \arg\max f(W) \tag{13}$$

then,

1) The BAM($W^*$, $M^*$) has the maximum energy well hyper-radius $1 \leq F^* = r \leq F$, where $r$ uniquely satisfies

$$N \sum_{i=1}^{r} \binom{Q+P}{i} - N \sum_{j=r+1}^{F} H_j \binom{Q+P}{j} \ \leq\ f(W^*) \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r+1} \tag{14}$$

2) $V^*(M^*) = \bigcup_{i=1}^{N} V_i(F^*-1, M^*) \supseteq$ any $V(M)$, i.e. for any $F' > F^*$, there is at least one pair $(X', Y') \in \bigcup_{i=1}^{N} V_i(F'-1, M)$ such that if it is input to BAM, the output layer will not converge to the correct training pair.

Proof: We divide the proof into three parts. The first one is to show that $r$ uniquely satisfies inequality (14). The second is to prove that $F^* = r$ is the maximum energy well hyper-radius. The last one is to show that $V^*(M^*) = \bigcup_{i=1}^{N} V_i(F^*-1, M^*) \supseteq$ any $V(M)$.

Firstly, given a training weight vector $W$ and energy well hyper-radius $F$, $f(W)$ depends on the training pair set $\{(X_i, Y_i)\}_{i=1}^{N}$. For any pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $n \geq 1$, we put a penalty value $-H_n$ on the objective function if $(X, Y)$ has energy lower than or equal to that of any neighbor pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$. Since $H_l$ is a strictly decreasing series, the objective function $f(W)$ takes the largest value when only one neighbor pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, F+1)$ has energy lower than or equal to that of one pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, F) \Big]$. On the other hand, when every neighbor pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $n \geq F+1$, has energy lower than or equal to that of any pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$, $f(W)$ takes the lowest value. So, inequality (14) holds.

It can be shown by contradiction that only one unique $r$ satisfies the inequality (14). If there is $r'$, $1 \leq r' \neq r \leq F$, that satisfies inequality (14),

$$N \sum_{i=1}^{r'} \binom{Q+P}{i} - N \sum_{j=r'+1}^{F} H_j \binom{Q+P}{j} \ \leq\ f(W^*) \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r'+1} \tag{15}$$
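then, $1 \leq r' \neq r \leq F \Rightarrow F \geq r' \geq r+1$ or $r'+1 \leq r \leq F$.

If $F \geq r' \geq r+1$, from the right part of (14),

$$\begin{aligned} f(W^*) &\leq N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r+1} \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r'} \\ &= N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - N \sum_{j=r'+1}^{F} (H_j + 1) \binom{Q+P}{j} \\ &= N \sum_{i=1}^{r'} \binom{Q+P}{i} - 1 - N \sum_{j=r'+1}^{F} H_j \binom{Q+P}{j} \\ &< N \sum_{i=1}^{r'} \binom{Q+P}{i} - N \sum_{j=r'+1}^{F} H_j \binom{Q+P}{j} \ \leq\ f(W^*) \end{aligned}$$

This is inconsistent with the fact that $f(W^*) = f(W^*)$.

If $r'+1 \leq r \leq F$, from the right part of (15),

$$\begin{aligned} f(W^*) &\leq N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r'+1} \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r} \\ &= N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - N \sum_{j=r+1}^{F} (H_j + 1) \binom{Q+P}{j} \\ &= N \sum_{i=1}^{r} \binom{Q+P}{i} - 1 - N \sum_{j=r+1}^{F} H_j \binom{Q+P}{j} \\ &< N \sum_{i=1}^{r} \binom{Q+P}{i} - N \sum_{j=r+1}^{F} H_j \binom{Q+P}{j} \ \leq\ f(W^*) \end{aligned}$$

This is inconsistent with the fact that $f(W^*) = f(W^*)$.
Hence, inequality (14) is satisfied by a unique $r$.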


Secondly, if $F^* = r = F$, then $F^*$ is the maximum energy well hyper-radius. If $F^* = r < F$, then the conclusion that $F^* = r$ is the maximum energy well hyper-radius can be proved by contradiction as follows.

If there is a ($W^{**}$, $M^{**}$) pair with the energy well hyper-radius $F^{**} = e$, $1 \leq r < e \leq F$, then,

$$f(W^*) \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_{r+1} \ \leq\ N \sum_{i=1}^{F} \binom{Q+P}{i} - 1 - H_e$$

while

$$f(W^{**}) \ \geq\ N \sum_{i=1}^{e} \binom{Q+P}{i} - N \sum_{j=e+1}^{F} H_j \binom{Q+P}{j}$$

so,

$$\begin{aligned} f(W^*) - f(W^{**}) &\leq N \sum_{i=1}^{F} \binom{Q+P}{i} - H_e - 1 - N \sum_{i=1}^{e} \binom{Q+P}{i} + N \sum_{j=e+1}^{F} H_j \binom{Q+P}{j} \\ &= N \sum_{j=e+1}^{F} (H_j + 1) \binom{Q+P}{j} - H_e - 1 \\ &= -1 < 0 \end{aligned}$$

Then we obtain $f(W^{**}) > f(W^*)$, which is inconsistent with equation (13) that defines $W^*$ as the optimal solution. So $F^*$ is the maximum energy well hyper-radius.


Thirdly, since $F^*$ is the maximum energy well hyper-radius, for any $F' > F^*$, there is at least one neighbor pair $(X, Y) \in \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n)$, $F^*+1 \leq n \leq F'$, which has energy lower than or equal to that of one pair $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$. Then if this neighbor pair $(X', Y')$ is input to BAM, the output pair will not be the correct training pair. Since $\bigcup_{i=1}^{N} V_i(F'-1, M) = \bigcup_{i=1}^{N} \Big[ \bigcup_{j=0}^{F'-1} \Omega(X_i, Y_i, j) \Big]$ and $(X', Y') \in \Omega(X, Y, 1) \cap \Big[ \bigcup_{i=1}^{N} \Omega(X_i, Y_i, n-1) \Big]$, $F^*+1 \leq n \leq F'$, we can obtain that $(X', Y') \in \bigcup_{i=1}^{N} V_i(F'-1, M)$. So, there is at least one input pair $(X', Y') \in \bigcup_{i=1}^{N} V_i(F'-1, M)$ such that if it is input to BAM, the network does not converge to the correct training pair. Hence, the optimum tolerance set is $V^*(M^*) = \bigcup_{i=1}^{N} V_i(F^*-1, M^*)$.


Remarks: The optimum noise tolerance set $V^*(M^*) = \bigcup_{i=1}^{N} V_i(F^*-1, M^*)$ will be called the maximum noise tolerance set. It is defined for a fixed training pair set. It is possible to find some method, such as the dummy augmentation in [2], to change the set of training pairs to one with increased separation between the training pairs but with the same information content. Intuitively, this can increase the probability of finding a larger maximum noise tolerance set due to an increased energy well hyper-radius upper bound.

There are three types of neighbors for BAM: 1) the ones in $V^*(M^*)$, whose output pairs converge to the correct training pairs; 2) the ones whose deviations are beyond the upper bound $F = \frac{1}{2} \min \Big\{ \min_{0 < i \neq j \leq N} H_x(X_i, X_j),\ \min_{0 < i \neq j \leq N} H_y(Y_i, Y_j) \Big\} + 1$, whose output pairs will not converge to correct training pairs; 3) others that may or may not be recalled correctly.

Since our approach is based on the energy surface, it can be applied with different energy definitions
to obtain maximum noise tolerance sets for the higher capacity BAMs [9]-[12], and not only for
the basic BAM.


IV.  Computer  Simulations 

A numerical example is given in this section to evaluate the performance of the extended BAM with optimized training weights. Suppose one wants to distinguish the three pattern pairs shown in Fig. 2.

Fig. 2. Three Training Pairs

X1 = (-1,-1,-1,-1,-1,-1,-1,1,-1,-1,-1,1,1,1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,-1)

Y1 = (-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)

X2 = (1,1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,1,1)

Y2  =  (-1,1, 1,1, -1, -1,-1, -1,-1, 

X3 = (1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,-1,1,-1,-1,-1,-1,-1,-1,-1,1,-1,-1,-1,1)
y3  =  (i,u,u,-i,-u,-i,-u,-u,-u,-i,-u,-i,-u,u,u) 


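So,

$H_x(X_1, X_2) = 12$, $H_y(Y_1, Y_2) = 8$
$H_x(X_1, X_3) = 8$, $H_y(Y_1, Y_3) = 16$
$H_x(X_2, X_3) = 8$, $H_y(Y_2, Y_3) = 8$

and the upper bound of the energy well hyper-radius from Lemma 2 is $F = 8/2 + 1 = 5$.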
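The input-layer distances and the resulting upper bound can be reproduced with a few lines of Python. This is only a checking sketch: the three input vectors are copied from the example, while the output-layer distances are taken as reported in the text rather than recomputed. The names `hamming`, `hx`, `hy` and `upper_bound` are illustrative.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


X = {
    1: [-1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1,
        -1, -1, 1, -1, -1, -1, -1, -1, -1, -1],
    2: [1, 1, -1, -1, -1, 1, 1, -1, -1, -1, -1, -1, 1, -1, -1,
        -1, -1, -1, 1, 1, -1, -1, -1, 1, 1],
    3: [1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1,
        -1, -1, -1, -1, -1, 1, -1, -1, -1, 1],
}

# Pairwise input-layer distances H_x(X_i, X_j).
hx = {(i, j): hamming(X[i], X[j]) for i, j in combinations(sorted(X), 2)}

# Output-layer distances H_y(Y_i, Y_j), taken as reported in the text.
hy = {(1, 2): 8, (1, 3): 16, (2, 3): 8}

# Upper bound of Lemma 2: half the smallest pairwise distance, plus one.
upper_bound = min(min(hx.values()), min(hy.values())) // 2 + 1
print(hx, upper_bound)
```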

In this example, to find the optimum training weights, the objective function defined in equation
(8) is used as the fitness function of the Genetic Algorithm (GA). The results obtained from the GA
are optimal with high probability. This is acceptable in real applications.
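The way a GA can search over integer training weights is sketched below. This is not the authors' implementation: the fitness used here is a simplified stand-in for $f(W)$ that only counts one-bit neighbors lying above their training pair in energy, and the operators (truncation selection, one-point crossover, random-reset mutation), the parameter values, and the toy training pairs are all assumptions chosen for brevity.

```python
import random


def weighted_matrix(xs, ys, ws):
    """M = sum_i w_i X_i^T Y_i."""
    n, m = len(xs[0]), len(ys[0])
    M = [[0] * m for _ in range(n)]
    for w, x, y in zip(ws, xs, ys):
        for r in range(n):
            for c in range(m):
                M[r][c] += w * x[r] * y[c]
    return M


def energy(M, x, y):
    """E = -X M Y^T."""
    return -sum(x[r] * M[r][c] * y[c] for r in range(len(x)) for c in range(len(y)))


def fitness(ws, xs, ys):
    """Stand-in objective: count one-bit neighbors with strictly higher energy."""
    M = weighted_matrix(xs, ys, ws)
    score = 0
    for x, y in zip(xs, ys):
        base = energy(M, x, y)
        for k in range(len(x)):
            score += energy(M, x[:k] + [-x[k]] + x[k + 1:], y) > base
        for k in range(len(y)):
            score += energy(M, x, y[:k] + [-y[k]] + y[k + 1:]) > base
    return score


def run_ga(xs, ys, pop_size=20, generations=30, max_weight=15, seed=0):
    """Return the best weight vector found; the top half survives each generation."""
    rng = random.Random(seed)
    n = len(xs)
    pop = [[1] * n] + [[rng.randint(1, max_weight) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, xs, ys), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.2:               # random-reset mutation
                child[rng.randrange(n)] = rng.randint(1, max_weight)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, xs, ys))


# Toy bipolar training pairs (not the patterns of Fig. 2).
xs = [[1, 1, 1, 1], [1, -1, 1, -1]]
ys = [[1, -1, 1], [-1, -1, 1]]
best = run_ga(xs, ys)
print(best, fitness(best, xs, ys))
```

Replacing the stand-in `fitness` with an implementation of equations (4) to (12) would turn the same loop into a search for $W^*$; the elitist selection guarantees that the best fitness never decreases from one generation to the next.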


[Plot: fitness value and training weights w1, w2, w3 versus generations]

Fig. 3. Fitness Plot and Training Weights

The resulting optimum is $W^* = (w_1^*, w_2^*, w_3^*) = (14, 14, 15)$, with $F^* = 2$. All training pairs have been recalled correctly,
and all noisy input pairs at Hamming distance one from the training pairs have converged
to the correct training pair.


V.  Conclusion 

We extended the Basic BAM, using an optimization-based training strategy. For a given set of training pairs, we determined the weights for the training pairs in the BAM correlation matrix that result in the maximum noise tolerance set. Any noisy input pair within the tolerance set will converge to the correct training pair. We proved that for a given set of training pairs, the maximum noise tolerance set is the largest in the sense that at least one pair, with Hamming distance one larger than the hyper-radius associated with the optimum noise tolerance set, will not converge to the correct training pair. A standard Genetic Algorithm (GA) was used to calculate the weights to maximize the objective function.

June 1, 2003

For BAM applications, the speed of encoding is less important than that of decoding because the encoding can be calculated offline. However, if adaptive encoding is needed to apply to some new desired pairs in real time simulation, the training time should be as short as possible. In the example of this paper, a standard GA was used. This GA worked well but was relatively inefficient: calculation times were quite long, and many generations and fitness evaluations were needed to find the optimal solution. Improving the performance of the BAM weight optimization is another future research direction.